Sifting Truths from Multiple Low-Quality Data Sources

Authors

  • Zizhe Xie
  • Qizhi Liu
  • Zhifeng Bao
Abstract

In this paper, we study the problem of assessing the quality of co-reference tuples extracted from multiple low-quality data sources and finding true values from them. It is a critical part of an effective data integration solution. In order to solve this problem, we first propose a model to specify the tuple quality. Then we present a framework to infer the tuple quality based on the concept of quality predicates. In particular, we propose an algorithm underlying the framework to find true values for each attribute. Last, we have conducted extensive experiments on real-life data to verify the effectiveness and efficiency of our methods.

Similar resources

Discovering Multiple Truths with a Hybrid Model

Many data management applications require integrating information from multiple sources. The sources may not be accurate and provide erroneous values. We thus have to identify the true values from conflicting observations made by the sources. The problem is further complicated when there may exist multiple truths (e.g., a book written by several authors). In this paper we propose a model called...

Lower Bound Sifting for MDDs

Decision Diagrams (DDs) are a data structure for the representation and manipulation of discrete logic functions often applied in VLSI CAD. Common DDs to represent Boolean functions are Binary Decision Diagrams (BDDs). Multiple-valued logic functions can be represented by Multiple-valued Decision Diagrams (MDDs). The efficiency of a DD representation strongly depends on the variable ordering; t...

Domain-Aware Multi-Truth Discovery from Conflicting Sources

In the Big Data era, truth discovery has served as a promising technique to solve conflicts in the facts provided by numerous data sources. The most significant challenge for this task is to estimate source reliability and select the answers supported by high quality sources. However, existing works assume that one data source has the same reliability on any kinds of entity, ignoring the possib...

Evaluation of Image Segmentation Quality by Adaptive Ground Truth Composition

Segmenting an image is an important step in many computer vision applications. However, image segmentation evaluation is far from being well-studied in contrast to the extensive studies on image segmentation algorithms. In this paper, we propose a framework to quantitatively evaluate the quality of a given segmentation with multiple ground truth segmentations. Instead of comparing directly the ...

Augmented Sifting of Multiple-Valued Decision Diagrams

Discrete functions are now commonly represented by binary (BDD) and multiple-valued (MDD) decision diagrams. Sifting is an effective heuristic technique which applies adjacent variable interchanges to find a good variable ordering to reduce the size of a BDD or MDD. Linear sifting is an extension of BDD sifting where XOR operations involving adjacent variable pairs augment adjacent variable int...

Publication date: 2017